Aditya Verma
08/17/2025, 7:36 PM
controller:
  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    size: 1Gi  # Small since it's only a staging area
    mountPath: /var/pinot/controller/data
    storageClass: gp3
    extraVolumes: []
    extraVolumeMounts: []
  data:
    dir: /var/pinot/controller/data
  # S3 DeepStorage config
  config:
    controller.data.dir: "/var/pinot/controller/data"
    pinot.controller.storage.factory.class.s3: org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.local.temp.dir: "/var/pinot/temp"
    # S3 bucket path where segments are permanently stored
    pinot.controller.segment.fetcher.protocols: s3
    pinot.controller.segment.fetcher.s3.class: org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.segment.uploader.class: org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.segment.store.uri: "s3://my-pinot-segments"
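For completeness, the servers also need matching deep-store settings; a minimal sketch below, mirroring the nesting of the snippet above (the region value is a placeholder, and whether the chart key is really a plain config map rather than an extra-configs block is an assumption):
server:
  config:
    pinot.server.storage.factory.class.s3: org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.region: us-west-2   # placeholder region
    pinot.server.segment.fetcher.protocols: file,http,s3
    pinot.server.segment.fetcher.s3.class: org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher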
Rajat
08/18/2025, 11:47 AM
Balkar Lathwal
08/18/2025, 3:18 PM
mat
08/18/2025, 5:51 PM
With field is not null, the indexed null correctly registers the field as null. But the length of the array is still 1 and it still contains a literal string. This makes string matching a little wonky, as there is always a value in the array to be matched. So if a user tries to check whether an array contains the text 'null', it returns a value instead of nothing.
Is there a way to set my default for MV fields to an empty array to avoid this string matching issue?
I know I can just add field is not null to all my queries, but I want to see if there is a better answer I am not seeing.
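For context, a minimal sketch of the field spec shape being described, assuming a hypothetical MV column named tags (the column name and the explicit defaultNullValue are illustrative, not from the message above); Pinot's built-in default null value for a STRING dimension is the literal string "null", which is what yields the 1-element array described:
"dimensionFieldSpecs": [
  {
    "name": "tags",
    "dataType": "STRING",
    "singleValueField": false,
    "defaultNullValue": "null"
  }
]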
Aditya Verma
08/18/2025, 9:11 PM
{
  ...
  "dedupConfig": {
    "dedupEnabled": true,
    "hashFunction": "NONE",
    "dedupTimeColumn": "mtime",
    "metadataTTL": 30000
  },
  ...
}
San Kumar
08/19/2025, 6:19 AM
Alexander Maniates
08/19/2025, 7:09 PM
We are using createMetadataTarGz for segment creation and preferMetadataTarGz for the metadata push job. In testing this, I ran into errors on the controller side where the DefaultMetadataExtractor expects a segment tarball to contain a /v3 subdir, so it fails when it receives just the slim tarball, which only contains [metadata.properties, creation.meta] and not the /v3/ subdir.
[1] [2] [3] [4]
I have written up a fix that we are testing on our end: https://github.com/apache/pinot/pull/16635
I am curious if we might be doing something wrong on our end, or if folks are using a custom HadoopSegmentCreationMapper that produces a differently structured tarball for the segment metadata? Or are folks implementing a different MetadataExtractor for their own use? We are still on 1.2.0, but it looks like the code hasn't changed much around this.
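For readers following along, a rough sketch of the two layouts being contrasted (file names taken from the description above; the exact contents of a full segment tarball are an assumption):
# Full segment tarball (what DefaultMetadataExtractor appears to expect):
#   mySegment.tar.gz
#   └── v3/
#       ├── metadata.properties
#       ├── creation.meta
#       └── ... (index and data files)
#
# Slim metadata tarball produced with createMetadataTarGz (as described above):
#   mySegment.tar.gz
#   ├── metadata.properties
#   └── creation.meta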
08/21/2025, 5:08 PM"segmentIngestionType": "REFRESH",
, but as I'm generating the data with BigQuery, the number of output partfiles may vary occasionally, and I'm seeing that if we export 16 files on day 1, it'll load 16 segments.... but if we export 15 files on day 2, it'll only replace 15 segments and leave the 16th segment from day 1, leaving duplicate/bad data in the table.
are there nice ways of handling this problem? My plan was to set up a task that reads the existing number of segments and manually deletes any additional segments, but there would still be some small amount of time where there would be duplicate data so if there are better ways, I'm all earsRohini Choudhary
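For reference, the REFRESH setting above normally sits in the batch ingestion config, roughly as sketched below; the frequency value is a placeholder, and consistentDataPush (atomic replacement of the full segment set, available in newer releases) is shown only as a possibly relevant option, not as the poster's actual setup:
"ingestionConfig": {
  "batchIngestionConfig": {
    "segmentIngestionType": "REFRESH",
    "segmentIngestionFrequency": "DAILY",
    "consistentDataPush": true
  }
}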
08/21/2025, 7:10 PM"pool": {
"DefaultTenant_REALTIME": "1"
}
},
"listFields": {
"TAG_LIST": [
"DefaultTenant_OFFLINE",
"DefaultTenant_REALTIME"
]
But while creating the table we are getting the below error:
{
  "code": 500,
  "error": "Index 3 out of bounds for length 3"
}
Our table config looks like this:
{
"REALTIME": {
"tableName": "otel_spans_REALTIME",
"tableType": "REALTIME",
"segmentsConfig": {
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "1",
"segmentPushType": "APPEND",
"timeColumnName": "startTimeUnixMilli",
"minimizeDataMovement": false,
"schemaName": "otel_spans",
"replication": "3",
"completionConfig": {
"completionMode": "DOWNLOAD"
}
},
"instanceAssignmentConfigMap": {
"CONSUMING": {
"partitionSelector": "FD_AWARE_INSTANCE_PARTITION_SELECTOR",
"tagPoolConfig": {
"tag": "DefaultTenant_REALTIME",
"poolBased": true
},
"replicaGroupPartitionConfig": {
"replicaGroupBased": true,
"numReplicaGroups": 3
}
}
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant",
"tagOverrideConfig": {}
},
"tableIndexConfig": {
"rangeIndexVersion": 2,
"loadMode": "MMAP",
"autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false,
"enableDefaultStarTree": false,
"enableDynamicStarTreeCreation": false,
"aggregateMetrics": false,
"nullHandlingEnabled": false,
"columnMajorSegmentBuilderEnabled": true,
"optimizeDictionary": false,
"optimizeDictionaryForMetrics": false,
"noDictionarySizeRatioThreshold": 0.85,
"noDictionaryColumns": [
"traceId",
"spanId",
"parentSpanId",
"resourceAttributes",
"attributes",
"startTimeUnixMilli",
"endTimeUnixMilli",
"statusMessage",
"events"
],
"invertedIndexColumns": [
"serviceName",
"name",
"statusCode"
],
"bloomFilterColumns": [
"traceId"
],
"onHeapDictionaryColumns": [],
"rangeIndexColumns": [
"duration"
],
"sortedColumn": [
"startTimeUnixMilli"
],
"varLengthDictionaryColumns": []
},
"metadata": {},
"quota": {},
"routing": {
"instanceSelectorType": "replicaGroup",
"segmentPrunerTypes": [
"time"
]
},
"query": {},
"fieldConfigList": [
{
"name": "startTimeUnixMilli",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "endTimeUnixMilli",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "traceId",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "spanId",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "parentSpanId",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "resourceAttributes",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "events",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
}
},
"tierOverwrites": null
},
{
"name": "attributes",
"encodingType": "RAW",
"indexTypes": [],
"indexes": {
"forward": {
"compressionCodec": "ZSTANDARD",
"deriveNumDocsPerChunk": false,
"rawIndexWriterVersion": 4
},
"json": {
"compressionCodec": "ZSTANDARD",
"maxLevels": 1,
"excludeArray": true,
"disableCrossArrayUnnest": true,
"includePaths": null,
"excludePaths": null,
"excludeFields": null,
"indexPaths": null
}
},
"tierOverwrites": null
}
],
"ingestionConfig": {
"streamIngestionConfig": {
"streamConfigMaps": [
{
"streamType": "kafka",
"stream.kafka.topic.name": "flattened_spans",
"stream.kafka.broker.list": "kafka:9092",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "30m",
"realtime.segment.flush.threshold.segment.size": "300M",
"realtime.segment.serverUploadToDeepStore": "true"
}
]
},
"continueOnError": false,
"rowTimeValueCheck": false,
"segmentTimeValueCheck": true
},
"isDimTable": false
}
}
Does anyone have any idea? The error is not very clear about what is wrong with the schema.
One more observation: if we apply only pool-based instance assignment, by removing the "partitionSelector": "FD_AWARE_INSTANCE_PARTITION_SELECTOR" part from the config, it works properly.
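For clarity, the working variant being described is the same CONSUMING block with the partition selector dropped, i.e. (reconstructed directly from the config above, not a verified fix):
"instanceAssignmentConfigMap": {
  "CONSUMING": {
    "tagPoolConfig": {
      "tag": "DefaultTenant_REALTIME",
      "poolBased": true
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numReplicaGroups": 3
    }
  }
}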
madhulika
08/22/2025, 4:34 AM
Balkar Lathwal
08/25/2025, 10:20 AM
madhulika
08/26/2025, 9:48 AM
prasanna
08/26/2025, 2:13 PM
ZEBIN KANG
08/26/2025, 9:45 PM
{
  "name": "request_ts",
  "dataType": "TIMESTAMP",
  "format": "1:SECONDS:EPOCH",
  "granularity": "1:SECONDS"
},
To improve the indexing, we are also doing:
1. "timeColumnName": "request_ts",
2. "rangeIndexColumns": ["request_ts"],
3. "routing": {"segmentPrunerTypes": ["time"]}
4. adding request_ts to timestampConfig with granularities like ["DAY","WEEK","MONTH"]
Could you please share whether these changes are helpful, or if some of them do not improve performance much 🙇
cc: @Neeraja Sridharan @Sai Tarun Tadakamalla
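On point 4, the timestamp index is declared per column under fieldConfigList; a minimal sketch of the documented shape (this is a sketch, not the poster's actual config):
"fieldConfigList": [
  {
    "name": "request_ts",
    "indexTypes": ["TIMESTAMP"],
    "timestampConfig": {
      "granularities": ["DAY", "WEEK", "MONTH"]
    }
  }
]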
madhulika
08/27/2025, 4:31 AM
madhulika
08/28/2025, 3:53 AM
Soon
08/28/2025, 2:25 PM
As explained in the forward index section, a column that is both sorted and equipped with a dictionary is encoded in a specialized manner that serves the purpose of implementing both forward and inverted indexes. Consequently, when these conditions are met, an inverted index is effectively created without additional configuration, even if the configuration suggests otherwise.
We have a column that is dictionary-enabled and configured as sorted in a realtime table. We have also set autoGeneratedInvertedIndex and createInvertedIndexDuringSegmentGeneration to true in the table config. However, we are not seeing the inverted index being used in the explain plan of the query. Would the inverted index also need to be configured in the table config for it to take effect?
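A quick way to see what the planner actually picks for that column; table and column names below are placeholders, and the operator names are what such a plan typically reports rather than guaranteed output:
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM myTable WHERE sortedCol = 'someValue';
-- For a sorted, dictionary-encoded column the filter usually shows up as a sorted-index
-- operator (e.g. FILTER_SORTED_INDEX) rather than FILTER_INVERTED_INDEX, since the sorted
-- forward index already behaves like an inverted index per the quoted doc passage.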
raghav
08/28/2025, 3:13 PM
• RuntimeException: Caught exception while running BloomFilterSegmentPruner (caused by a TimeoutException in QueryMultiThreadingUtils.runTasksWithDeadline)
• RuntimeException: Caught exception while running CombinePlanNode (also a TimeoutException)
These errors appear all the time, not just under peak load. We recently increased server RAM, but otherwise made no config changes. Unfortunately, I don't have older logs to check whether this was happening before.
Has anyone seen similar behavior, and what could cause it to affect only a subset of servers?
madhulika
08/29/2025, 4:05 AM
Vatsal Agrawal
08/29/2025, 5:28 AM
Deepak Padhi
08/29/2025, 10:04 AM
Deepak Padhi
08/29/2025, 10:04 AM
Rajkumar
08/30/2025, 6:47 PM
Rajkumar
08/30/2025, 6:47 PM
{
"tableName": "kafka_test_1",
"tableType": "REALTIME",
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant",
"tagOverrideConfig": {}
},
"segmentsConfig": {
"timeColumnName": "time",
"replication": "1",
"replicasPerPartition": "1",
"retentionTimeUnit": null,
"retentionTimeValue": null,
"completionConfig": null,
"crypterClassName": null,
"peerSegmentDownloadScheme": null,
"schemaName": "kafka_test"
},
"tableIndexConfig": {
"loadMode": "MMAP",
"invertedIndexColumns": [],
"createInvertedIndexDuringSegmentGeneration": false,
"rangeIndexColumns": [],
"sortedColumn": [],
"bloomFilterColumns": [],
"bloomFilterConfigs": null,
"noDictionaryColumns": [],
"onHeapDictionaryColumns": [],
"varLengthDictionaryColumns": [],
"enableDefaultStarTree": false,
"starTreeIndexConfigs": null,
"enableDynamicStarTreeCreation": false,
"segmentPartitionConfig": null,
"columnMinMaxValueGeneratorMode": null,
"aggregateMetrics": false,
"nullHandlingEnabled": false,
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "PINOT.TEST",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.broker.list": "{}",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka30.KafkaConsumerFactory",
"stream.kafka.security.protocol": "SASL_SSL",
"stream.kafka.sasl.mechanism": "OAUTHBEARER",
"stream.kafka.sasl.login.callback.handler.class": "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler",
"stream.kafka.sasl.oauthbearer.token.endpoint.url": "{url}",
"stream.kafka.sasl.jaas.config": "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required clientId='{}' clientSecret='{}' scope='' extension_logicalCluster='{}' extension_identityPoolId='{}';",
"stream.kafka.ssl.endpoint.identification.algorithm": "https",
"stream.kafka.consumer.prop.group.id": "{}",
"stream.kafka.consumer.prop.auto.offset.reset": "earliest",
"<http://stream.kafka.consumer.prop.request.timeout.ms|stream.kafka.consumer.prop.request.timeout.ms>": "60000",
"<http://stream.kafka.consumer.prop.metadata.max.age.ms|stream.kafka.consumer.prop.metadata.max.age.ms>": "60000",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaAvroMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.url": "https://{}.westeurope.azure.confluent.cloud",
"stream.kafka.decoder.prop.schema.registry.basic.auth.credentials.source": "USER_INFO",
"<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": "{key}:{secret}"
}
},
"metadata": {},
"ingestionConfig": {
"filterConfig": null,
"transformConfigs": null
},
"quota": {
"storage": null,
"maxQueriesPerSecond": null
},
"task": null,
"routing": {
"segmentPrunerTypes": null,
"instanceSelectorType": null
},
"query": {
"timeoutMs": null
},
"fieldConfigList": null,
"upsertConfig": null,
"tierConfigs": null
}
Rajkumar
09/01/2025, 10:53 AM
Rajkumar
09/01/2025, 10:55 AM
"streamType": "kafka",
"stream.kafka.topic.name": "asdas",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.broker.list": "asasds.westeurope.azure.confluent.cloud:9092",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"security.protocol": "SASL_SSL",
"sasl.mechanism": "PLAIN",
"sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";",
"ssl.endpoint.identification.algorithm": "https",
"auto.offset.reset": "earliest",
"<http://stream.kafka.consumer.prop.request.timeout.ms|stream.kafka.consumer.prop.request.timeout.ms>": "60000",
"<http://stream.kafka.consumer.prop.metadata.max.age.ms|stream.kafka.consumer.prop.metadata.max.age.ms>": "60000",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "<https://dasdsa.westeurope.azure.confluent.cloud>",
"stream.kafka.decoder.prop.schema.registry.basic.auth.credentials.source": "USER_INFO",
"<http://stream.kafka.decoder.prop.schema.registry.basic.auth.user.info|stream.kafka.decoder.prop.schema.registry.basic.auth.user.info>": ":",
"stream.kafka.decoder.prop.schema.registry.schema.name": "KsqlDataSourceSchema",
"stream.kafka.decoder.prop.format": "AVRO"
Mayank
Naveen
09/02/2025, 3:37 PM
Rajkumar
09/02/2025, 4:48 PM
Rajkumar
09/02/2025, 4:48 PM
split(PEXP_DEAL_KEY, '|', 1)